James Webb and Hubble Space Telescopes snap images of same nebula, 10 years apart
The two images of Westerlund 2 show just how far the technology has come. Astronomers are studying the hundreds of young brown dwarfs inside the stellar nursery. In 2015, NASA celebrated the Hubble Space Telescope's 25th year in orbit by releasing one of its most stunning images to date: a colorful star cluster in the constellation Carina known as Westerlund 2. However, a lot can change in a decade.
- Government > Space Agency (0.39)
- Government > Regional Government > North America Government > United States Government (0.39)
Red Spider Nebula glows in ethereal new JWST image
This new James Webb Space Telescope image features a cosmic creepy-crawly called NGC 6537, the Red Spider Nebula. Using its sophisticated Near-InfraRed Camera (NIRCam), JWST has revealed never-before-seen details in this picturesque planetary nebula, set against a rich backdrop of thousands of stars, as if a cosmic spider had been caught in a web of its own.
- North America > United States (0.04)
- Asia > Middle East > Israel (0.04)
Textual interpretation of transient image classifications from large language models
Stoppa, Fiorenzo, Bulmus, Turan, Bloemen, Steven, Smartt, Stephen J., Groot, Paul J., Vreeswijk, Paul, Smith, Ken W.
Modern astronomical surveys deliver immense volumes of transient detections, yet distinguishing real astrophysical signals (for example, explosive events) from bogus imaging artefacts remains a challenge. Convolutional neural networks are effectively used for real versus bogus classification; however, their reliance on opaque latent representations hinders interpretability. Here we show that large language models (LLMs) can approach the performance level of a convolutional neural network on three optical transient survey datasets (Pan-STARRS, MeerLICHT and ATLAS) while simultaneously producing direct, human-readable descriptions for every candidate. Using only 15 examples and concise instructions, Google's LLM, Gemini, achieves a 93% average accuracy across datasets that span a range of resolution and pixel scales. We also show that a second LLM can assess the coherence of the output of the first model, enabling iterative refinement by identifying problematic cases. This framework allows users to define the desired classification behaviour through natural language and examples, bypassing traditional training pipelines. Furthermore, by generating textual descriptions of observed features, LLMs enable users to query classifications as if navigating an annotated catalogue, rather than deciphering abstract latent spaces. As next-generation telescopes and surveys further increase the amount of data available, LLM-based classification could help bridge the gap between automated detection and transparent, human-level understanding.
- Europe > United Kingdom > England > Oxfordshire > Oxford (0.14)
- Europe > Netherlands > North Holland > Amsterdam (0.04)
- Africa > South Africa > Western Cape > Cape Town (0.04)
- Europe > Netherlands > Gelderland > Nijmegen (0.04)
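The few-shot, instruction-driven classification described in the abstract above can be sketched as a small pipeline: build a prompt from concise instructions plus a handful of labelled examples, then parse the model's label and human-readable description from the reply. This is an illustrative sketch only, with the LLM call abstracted behind a callable so any provider (such as Gemini) could be plugged in; the function and variable names are hypothetical, not the authors' code.

```python
# Hypothetical sketch of a few-shot "real vs bogus" transient classifier.
# The LLM itself is a black box passed in as `call_llm`.

def build_prompt(instructions, examples, candidate):
    """Assemble a few-shot prompt: instructions, labelled examples,
    then the unlabelled candidate to classify."""
    parts = [instructions]
    for features, label in examples:
        parts.append(f"Features: {features}\nLabel: {label}")
    parts.append(f"Features: {candidate}\nLabel:")
    return "\n\n".join(parts)

def classify(call_llm, instructions, examples, candidate):
    """Return (label, description) parsed from the LLM reply,
    assuming a 'label; free-text description' reply format."""
    reply = call_llm(build_prompt(instructions, examples, candidate))
    label, _, description = reply.partition(";")
    return label.strip().lower(), description.strip()

# Usage with a stand-in model in place of a real LLM call:
fake_llm = lambda prompt: "bogus; sharp single-pixel spike, no PSF profile"
label, why = classify(
    fake_llm,
    "Classify each transient detection as real or bogus and explain why.",
    [("round PSF, persists in both epochs", "real"),
     ("single hot pixel, no counterpart", "bogus")],
    "isolated bright pixel near chip edge",
)
```

Because the model returns a textual description alongside each label, a second LLM (or a human) can audit the description for coherence, which is the refinement loop the paper describes.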
MathCoder-VL: Bridging Vision and Code for Enhanced Multimodal Mathematical Reasoning
Wang, Ke, Pan, Junting, Wei, Linda, Zhou, Aojun, Shi, Weikang, Lu, Zimu, Xiao, Han, Yang, Yunqiao, Ren, Houxing, Zhan, Mingjie, Li, Hongsheng
Natural language image-caption datasets, widely used for training Large Multimodal Models, mainly focus on natural scenarios and overlook the intricate details of mathematical figures that are critical for problem-solving, hindering the advancement of current LMMs in multimodal mathematical reasoning. To this end, we propose leveraging code as supervision for cross-modal alignment, since code inherently encodes all information needed to generate corresponding figures, establishing a precise connection between the two modalities. Specifically, we co-develop our image-to-code model and dataset with a model-in-the-loop approach, resulting in an image-to-code model, FigCodifier, and ImgCode-8.6M, the largest image-code dataset to date. Furthermore, we utilize FigCodifier to synthesize novel mathematical figures and then construct MM-MathInstruct-3M, a high-quality multimodal math instruction fine-tuning dataset. Finally, we present MathCoder-VL, trained with ImgCode-8.6M for cross-modal alignment and subsequently fine-tuned on MM-MathInstruct-3M for multimodal math problem solving. Our model achieves a new open-source SOTA across all six metrics. Notably, it surpasses GPT-4o and Claude 3.5 Sonnet on the geometry problem-solving subset of MathVista, achieving improvements of 8.9% and 9.2%, respectively. The dataset and models will be released at https://github.com/mathllm/MathCoder.
- South America > Chile > Santiago Metropolitan Region > Santiago Province > Santiago (0.04)
- Europe > Monaco (0.04)
- Asia > China > Hong Kong (0.04)
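The core idea in the abstract above, that code deterministically generates a figure and therefore gives an exact (image, code) supervision pair, can be illustrated with a toy example. The tiny drawing DSL and function names below are invented stand-ins for real plotting code, not the paper's pipeline.

```python
# Toy illustration of code-as-supervision: running the code yields the
# figure, so every (image, code) pair is exactly aligned by construction.

def render(code):
    """Execute a tiny drawing DSL ('dot ROW COL' per line) on a 5x5 grid
    and return the raster it produces."""
    grid = [[0] * 5 for _ in range(5)]
    for line in code.strip().splitlines():
        cmd, r, c = line.split()
        if cmd == "dot":
            grid[int(r)][int(c)] = 1
    return grid

def make_pair(code):
    """One (image, code) training pair for image-to-code alignment."""
    return render(code), code

image, code = make_pair("dot 1 1\ndot 3 3")
```

In the real setting the renderer is a plotting library and the code is, for example, TikZ or matplotlib; the alignment property is the same.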
Apple will use its street view Maps photos to train AI models
Apple plans to start using images it collects for Maps to train its AI models. In a disclosure spotted by 9to5Mac, the company said starting this month it would use images it captures to provide its Look Around feature for the additional purpose of training some of its generative AI models. Look Around is Apple's answer to Google Street View. The company originally released the feature alongside its 2019 revamp of Apple Maps. The tool allows users to see locations from ground level.
Conquering images and the basis of transformative action
Our rapid immersion into online life has made us all ill. Through the generation, personalization, and dissemination of enchanting imagery, artificial technologies commodify the minds and hearts of the masses with nauseating precision and scale. Online networks, artificial intelligence (AI), social media, and digital news feeds fine-tune our beliefs and pursuits by establishing narratives that subdivide and polarize our communities and identities. Meanwhile those commanding these technologies conquer the final frontiers of our interior lives, social relations, earth, and cosmos. In the Attention Economy, our agency is restricted and our vitality is depleted for their narcissistic pursuits and pleasures. Generative AI empowers the forces that homogenize and eradicate life, not through some stupid "singularity" event, but through devaluing human creativity, labor, and social life. Using a fractured lens, we will examine how narratives and networks influence us on mental, social, and algorithmic levels. We will discuss how atomizing imagery -- ideals and pursuits that alienate, rather than invigorate the individual -- hijack people's agency to sustain the forces that destroy them. We will discover how empires build digital networks that optimize society and embolden narcissists to enforce social binaries that perpetuate the ceaseless expansion of consumption, exploitation, and hierarchy. Structural hierarchy in the world is reified through hierarchy in our beliefs and thinking. Only by seeing images as images and appreciating the similarity shared by opposing narratives can we facilitate transformative action and break away from the militaristic systems plaguing our lives.
- North America > United States > California > Los Angeles County > Los Angeles (0.04)
- North America > United States > Nebraska (0.04)
- Europe > United Kingdom > England > Oxfordshire > Oxford (0.04)
- Media (1.00)
- Government (1.00)
- Information Technology (0.94)
- Health & Medicine > Therapeutic Area > Psychiatry/Psychology (0.47)
CogCoM: Train Large Vision-Language Models Diving into Details through Chain of Manipulations
Qi, Ji, Ding, Ming, Wang, Weihan, Bai, Yushi, Lv, Qingsong, Hong, Wenyi, Xu, Bin, Hou, Lei, Li, Juanzi, Dong, Yuxiao, Tang, Jie
Vision-Language Models (VLMs) have demonstrated their widespread viability thanks to extensive training in aligning visual instructions to answers. However, this conclusive alignment leads models to ignore critical visual reasoning, which further results in failures on meticulous visual problems and unfaithful responses. In this paper, we propose Chain of Manipulations, a mechanism that enables VLMs to solve problems with a series of manipulations, where each manipulation refers to an operation on the visual input, either from intrinsic abilities (e.g., grounding) acquired through prior training or from imitating human-like behaviors (e.g., zooming in). This mechanism encourages VLMs to generate faithful responses with evidential visual reasoning, and permits users to trace error causes along interpretable paths. We thus train CogCoM, a general 17B VLM with a memory-based compatible architecture endowed with this reasoning mechanism. Experiments show that our model achieves state-of-the-art performance across 8 benchmarks from 3 categories, and that a limited number of training steps with this data swiftly yields competitive performance. The code and data are publicly available at https://github.com/THUDM/CogCoM.
- Europe > Switzerland > Zürich > Zürich (0.14)
- Europe > Netherlands > North Holland > Amsterdam (0.04)
- Information Technology > Artificial Intelligence > Vision (1.00)
- Information Technology > Artificial Intelligence > Representation & Reasoning (1.00)
- Information Technology > Artificial Intelligence > Natural Language > Large Language Model (0.96)
- Information Technology > Artificial Intelligence > Cognitive Science > Problem Solving (0.66)
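A "chain of manipulations" as described in the abstract above can be sketched as a sequence of named operations applied to the visual input, with each intermediate state recorded so the reasoning path stays traceable. The operation names and the nested-list "image" below are illustrative assumptions, not CogCoM's actual implementation.

```python
# Illustrative chain-of-manipulations sketch: each manipulation is a
# named operation on the current visual state, and the chain keeps an
# interpretable trace of every intermediate result.

def zoom_in(region):
    """Crop to (row0, row1, col0, col1), mimicking a human zooming in."""
    def op(image):
        r0, r1, c0, c1 = region
        return [row[c0:c1] for row in image[r0:r1]]
    return ("zoom_in", op)

def grounding(target):
    """Stand-in for a learned grounding step: locate a target value."""
    def op(image):
        return [(r, c) for r, row in enumerate(image)
                for c, v in enumerate(row) if v == target]
    return ("grounding", op)

def run_chain(image, manipulations):
    """Apply manipulations in order, recording (name, state) at each step
    so error causes can be traced back through the chain."""
    trace, state = [], image
    for name, op in manipulations:
        state = op(state)
        trace.append((name, state))
    return state, trace

image = [[0, 0, 0, 0],
         [0, 7, 0, 0],
         [0, 0, 0, 0]]
result, trace = run_chain(image, [zoom_in((0, 2, 0, 2)), grounding(7)])
```

If the final answer is wrong, the trace shows which manipulation produced the faulty intermediate state, which is the interpretability benefit the paper claims.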
The AI-Generated Child Abuse Nightmare Is Here
A horrific new era of ultrarealistic, AI-generated child sexual abuse images is now underway, experts warn. Offenders are using downloadable open source generative AI models, which can produce images, to devastating effect. The technology is being used to create hundreds of new images of children who have previously been abused. Offenders are sharing datasets of abuse images that can be used to customize AI models, and they're starting to sell monthly subscriptions to AI-generated child sexual abuse material (CSAM). The details of how the technology is being abused are included in a new, wide-ranging report released by the Internet Watch Foundation (IWF), a nonprofit based in the UK that scours and removes abuse content from the web.
- Law Enforcement & Public Safety > Crime Prevention & Enforcement (1.00)
- Law (1.00)
- Health & Medicine > Therapeutic Area > Pediatrics/Neonatology (1.00)
Optimizing the AI Development Process by Providing the Best Support Environment
The purpose of this study is to investigate the development process for artificial intelligence (AI) and machine learning (ML) applications in order to provide the best support environment. The main stages of ML are problem understanding, data management, model building, model deployment, and maintenance. This project focuses on investigating the data management stage of ML development and its obstacles, as it is the most important stage of machine learning development: the accuracy of the end model relies on the kind of data fed into the model. The biggest obstacle found at this stage was the lack of sufficient data for model learning, especially in fields where data is confidential. This project aimed to build and develop a framework for researchers and developers that can help solve the lack of sufficient data during the data management stage. The framework utilizes several data augmentation techniques that can be used to generate new data from the original dataset, which can improve the overall performance of ML applications by increasing the quantity and quality of the data available to feed the model. The framework was built in Python to perform data augmentation using deep learning advancements.
- Asia (0.04)
- North America > United States > Massachusetts > Middlesex County > Cambridge (0.04)
- North America > United States > Arizona (0.04)
- Instructional Material > Course Syllabus & Notes (0.46)
- Research Report > New Finding (0.46)
- Health & Medicine > Diagnostic Medicine > Imaging (1.00)
- Health & Medicine > Therapeutic Area > Oncology (0.67)
- Information Technology > Security & Privacy (0.67)
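The augmentation framework described in the abstract above can be sketched as a registry of image transforms that expands a small dataset with extra views of each sample. The transform names below are generic geometric augmentations chosen for illustration; the study's actual framework and transform set are not reproduced here.

```python
# Minimal dependency-free sketch of a data augmentation registry:
# each transform maps an image (nested lists) to a new view, and the
# dataset is expanded with one copy per registered transform.

def hflip(image):
    """Mirror each row (horizontal flip)."""
    return [list(reversed(row)) for row in image]

def vflip(image):
    """Reverse row order (vertical flip)."""
    return list(reversed(image))

def rotate90(image):
    """Rotate 90 degrees clockwise."""
    return [list(row) for row in zip(*reversed(image))]

AUGMENTATIONS = [hflip, vflip, rotate90]

def augment_dataset(images):
    """Return the originals plus one augmented view per transform,
    multiplying the amount of training data available to the model."""
    out = list(images)
    for img in images:
        for fn in AUGMENTATIONS:
            out.append(fn(img))
    return out

data = [[[1, 2], [3, 4]]]
expanded = augment_dataset(data)
```

In practice a framework like this would also cover photometric transforms and generative techniques; the registry pattern is what lets new augmentations be plugged in without changing the pipeline.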